Improving Thai Word and Sentence Segmentation Using Linguistic Knowledge

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Chinese Word Segmentation Using Minimal Linguistic Knowledge

This paper presents a primarily data-driven Chinese word segmentation system and its performances on the closed track using two corpora at the first international Chinese word segmentation bakeoff. The system consists of a new words recognizer, a base segmentation algorithm, and procedures for combining single characters, suffixes, and checking segmentation consistencies.

متن کامل

Improving Word Alignment Quality Using Linguistic Knowledge

Word alignment of bilingual parallel corpora is usually generated using only statistical information. External linguistic information like e.g. a dictionary or linguistic structural annotation of the texts is used rarely, despite its usefulness. Additionally, it has to our knowledge never been examined systematically how linguistic information can be employed for word alignment improvement. In ...

متن کامل

Thoughts on Word and Sentence Segmentation in Thai

This paper discusses problems of word and sentence segmentation in Thai. Disagreements on word segmentation are caused mostly from compound words. To set a standard resource and tool of word segmentation, we suggest that only simple words and true compound words should be segmented in the process of word segmentation. Other compounds can be grouped later by the same means as multiword identific...

متن کامل

Collocation and Thai Word Segmentation

This paper presents another approach of Thai word segmentation, which is composed of two processes : syllable segmentation and syllable merging. Syllable segmentation is done on the basis of trigram statistics. Syllable merging is done on the basis of collocation between syllables. We argue that many of word segmentation ambiguities can be resolved at the level of syllable segmentation. Since a...

متن کامل

Thai Word Segmentation Verification Tool

Since Thai has no explicit word boundary, word segmentation is the first thing to do before developing any Thai NLP applications. In order to create large Thai word-segmented corpora to train a word segmentation model, an efficient verification tool is needed to help linguists work more conveniently to check the accuracy and consistency of the corpora. This paper proposes Thai Word Segmentation...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: IEICE Transactions on Information and Systems

سال: 2018

ISSN: 0916-8532,1745-1361

DOI: 10.1587/transinf.2018edp7016